{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Bayesian PCA"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "import seaborn as sns\n",
    "from sklearn.datasets import load_iris"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "%config InlineBackend.figure_format = \"retina\"\n",
    "np.set_printoptions(precision=4, suppress=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* A fully probabilistic approach to PCA allows us to automatically choose the dimensionality of the principal subspace\n",
    "* In this notebook, we consider a model in which only $\\bf W$ has a prior distribution of the form\n",
    "\n",
    "$$\n",
    "    p({\\bf W}\\vert\\boldsymbol\\alpha) = \\prod_{m=1}^M \\left(\\frac{\\alpha_m}{2\\pi}\\right)^{D/2} \\exp\\left(-\\frac{\\alpha_m}{2}{\\bf w}_m^T{\\bf w}_m\\right)\n",
    "$$\n",
    "\n",
    "To choose the values of $\\{\\alpha_m\\}_{m=1}^M$, we maximize the marginal likelihood once $\\bf W$ has been integrated out. That is, we want to maximize \n",
    "\n",
    "$$\n",
    "    p({\\bf X}\\vert\\boldsymbol\\mu, \\boldsymbol\\mu, \\sigma^2) = \\int\\prod_{m=1}^M \\mathcal{N}({\\bf w}_m\\vert {\\bf 0}, \\alpha_m{\\bf I}) \\cdot \\mathcal{N}({\\bf x}_n \\vert \\boldsymbol\\mu,{\\bf C}) \\ \\text{d}{\\bf W} \\label{eq:a}\\tag{1}\n",
    "$$\n",
    "\n",
    "Where\n",
    "* ${\\bf C} = {\\bf W}{\\bf W}^T + \\sigma^2{\\bf I}$\n",
    "\n",
    "\n",
    "The integral over $\\eqref{eq:a}$ is intractable, thus we make use of the Laplace approximation"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "iris = load_iris()\n",
    "X_iris, y_iris = iris[\"data\"], iris[\"target\"]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "N, D = X_iris.shape\n",
    "M = 2\n",
    "sigma2 = 1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "W = np.random.randn(D, M)\n",
    "S = np.cov(X_iris.T)\n",
    "C = W @ W.T + np.eye(D) * sigma2\n",
    "alpha = np.random.rand(M)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Rewriting the equation $\\sum_{m}\\alpha_m{\\bf w}^T_m{\\bf w}_m$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "7.1233544857254705"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "np.einsum(\"m,jm,jm\",alpha, W, W)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "7.123354485725471"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "np.trace(alpha * np.identity(M) @ W.T @ W)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "7.123354485725471"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "np.trace(W @ (alpha * np.identity(M)) @ W.T)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[  9.24  , -18.6294],\n",
       "       [ -0.6708,  24.698 ],\n",
       "       [ -4.0795,  -1.6487],\n",
       "       [  2.0436, -21.8892]])"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "np.linalg.inv(S @ np.linalg.inv(C) - np.eye(D)) @ C @ W"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [],
   "source": [
    "A = np.random.randn(5, 5)\n",
    "B = np.random.randn(5, 5)\n",
    "C = np.random.randn(5, 5)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "3.151859574213212"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "np.trace(A @ B)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "3.151859574213212"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "np.einsum(\"in,ni->\", A, B)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "-6.112247548953237"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "np.trace(A @ B @ A.T)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "-6.112247548953242"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "np.einsum(\"nm,ml,nl->\", A, B, A)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "$$\n",
    "    \\frac{\\partial}{\\partial A_{ij}}\\text{Tr}(ABA^T) = \\left(A\\left[B^T + B\\right]\\right)_{ij}\n",
    "$$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[ 3.3534, -2.5422,  1.4369,  4.3057,  1.789 ],\n",
       "       [ 0.0697, -7.2008,  4.1166, -1.0726, -1.6574],\n",
       "       [-0.2658,  0.7849, -3.2606, -2.8398, -1.8644],\n",
       "       [ 6.6364, -0.1028, -4.5675, 16.3631,  4.7665],\n",
       "       [-4.1316, -3.3403, -0.5746, -8.3594, -5.8196]])"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "A @ (B.T  + B)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[ 3.3534, -2.5422,  1.4369,  4.3057,  1.789 ],\n",
       "       [ 0.0697, -7.2008,  4.1166, -1.0726, -1.6574],\n",
       "       [-0.2658,  0.7849, -3.2606, -2.8398, -1.8644],\n",
       "       [ 6.6364, -0.1028, -4.5675, 16.3631,  4.7665],\n",
       "       [-4.1316, -3.3403, -0.5746, -8.3594, -5.8196]])"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "np.einsum(\"il, jl\", A, B) + np.einsum(\"il,lj\", A, B)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "$$\n",
    "    \\text{Tr}\\left(ABA^TC\\right)\n",
    "$$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "-11.805289028013819"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "np.trace(A @ B @ A.T @ C)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "-11.805289028013789"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "np.einsum(\"nm,ml,ol,on->\", A, B, A, C)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "$$\n",
    "    \\frac{\\partial}{\\partial A_{ij}}\\text{Tr}\\left(ABA^TC\\right) = \\left(C^T A B^T + C A B\\right)_{ij}\n",
    "$$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[ 2.5339, -2.6197,  1.0428, 14.9636,  6.5248],\n",
       "       [-0.7265, -4.1606,  0.5044, 13.231 ,  3.9848],\n",
       "       [-0.6543, -2.3858, -1.6314,  4.5779,  0.9784],\n",
       "       [-2.7903, -1.6052, -1.1076,  4.1098, -0.1175],\n",
       "       [-7.2523, -5.1329,  3.7958, -8.5381, -8.1382]])"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "np.einsum(\"jn,mn,mi->ij\", B, A, C)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[ 2.5339, -2.6197,  1.0428, 14.9636,  6.5248],\n",
       "       [-0.7265, -4.1606,  0.5044, 13.231 ,  3.9848],\n",
       "       [-0.6543, -2.3858, -1.6314,  4.5779,  0.9784],\n",
       "       [-2.7903, -1.6052, -1.1076,  4.1098, -0.1175],\n",
       "       [-7.2523, -5.1329,  3.7958, -8.5381, -8.1382]])"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "(C.T @ A @ B.T)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[ 4.8074, -5.2833,  2.6995,  0.5159,  2.0417],\n",
       "       [ 3.7879, -4.2632, -1.4116,  2.4091, -0.8741],\n",
       "       [-7.598 ,  5.1263, -0.5171, -5.5249, -0.9244],\n",
       "       [ 7.2606, -3.6446, -0.8113,  7.3453,  1.4012],\n",
       "       [-2.4637,  0.7962, -1.827 , -2.6136, -2.1997]])"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "np.einsum(\"nm,mj,in->ij\", A, B, C)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[ 4.8074, -5.2833,  2.6995,  0.5159,  2.0417],\n",
       "       [ 3.7879, -4.2632, -1.4116,  2.4091, -0.8741],\n",
       "       [-7.598 ,  5.1263, -0.5171, -5.5249, -0.9244],\n",
       "       [ 7.2606, -3.6446, -0.8113,  7.3453,  1.4012],\n",
       "       [-2.4637,  0.7962, -1.827 , -2.6136, -2.1997]])"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "C @ A @ B"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [],
   "source": [
    "import sympy\n",
    "i, j, n, m, l, o = sympy.symbols(\"i j n m l o\")\n",
    "D = sympy.symbols(\"D\")\n",
    "A, B, C = sympy.symbols(\"A B C\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# References \n",
    "\n",
    "1. https://papers.nips.cc/paper/1549-bayesian-pca.pdf\n",
    "2. https://www.cs.toronto.edu/~rsalakhu/STA4273_2015/notes/Lecture8_2015.pdf\n",
    "3. https://haralick.org/ML/Neural_Networks_for_Pattern_Recognition_Christopher_Bishop.pdf\n",
    "4. http://www.miketipping.com/papers/met-mppca.pdf\n",
    "5. https://www.apps.stat.vt.edu/leman/VTCourses/PPCA.pdf"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
